In this notebook, we will present how to classify flower images by using transfer learning from a pre-trained network.

A pre-trained model is a saved network that was previously trained on a large dataset.

The idea behind transfer learning for image classification is that a model trained on a large and sufficiently representative dataset can serve as a generic base model for categorising images. Reusing its learned feature maps saves a lot of training time.

Here, we're gonna test two approaches:

  1. feature extraction: We use the representations learned by a previously trained network to extract meaningful features from new data. We simply add a new classifier, trained from scratch, on top of the pretrained model so that we can reuse the feature maps it has already learned. We do not need to retrain the whole network: the convolutional base already contains features that are generally useful for classifying images. Note, however, that the final classification part of the pretrained model is specific to its original classification task, so it is not reused.

  2. fine tuning: We unfreeze a few of the top layers of the frozen base model and jointly train both the newly added classifier and these last layers of the base model. We do this because the first convolutional layers of the base model only extract basic features (edges, vertical/horizontal lines, etc.), while its last layers extract higher-level feature maps. Fine-tuning these higher-level feature representations makes them more relevant for our specific classification task.

I) Introduction

II) Data

III) CNN model

IV) Model evaluation

V) Conclusion

I) Introduction

In this notebook, we'll make use of a much more complex architecture: the MobileNetV2 model. Released in 2018, MobileNetV2 is a significant improvement over MobileNetV1 and pushes the state of the art for mobile visual recognition, including classification, object detection and segmentation. If you're interested, you can find more details in the release paper.

II) Data

2.1 - Load & explore data

Let's check how many images of each flower species are present.

Now, we're gonna load the different images and transform them into numpy arrays.

2.2 - Label encoding

The labels are the 5 species indices (from 0 to 4). We need to encode these labels as one-hot vectors. For instance, an image of a sunflower should have the label 3 and the corresponding vector y = [0,0,0,1,0].
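As a minimal sketch of this encoding, the snippet below builds one-hot vectors with plain NumPy. The helper name `one_hot` is hypothetical and only meant for illustration; the class index 3 for the sunflower is taken from the example above.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot row vectors."""
    labels = np.asarray(labels, dtype=int)
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes, dtype=int)[labels]

# Label 3 (the sunflower index used in the text) among 5 species.
y = one_hot([3, 0, 4], num_classes=5)
print(y[0])  # [0 0 0 1 0]
```

Keras also provides a `to_categorical` utility for the same purpose; the NumPy version is shown here only because it makes the mechanics explicit.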

2.3 - Split training and validation set

Here, we're gonna split our dataset into training, validation and test sets. This helps avoid a biased evaluation: the model is trained on images with known labels, then we check its accuracy on the validation set, which contains images the model has not seen before. Finally, we compute the accuracy on the test set.
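One possible way to produce such a three-way split is to shuffle the sample indices once and cut them into three disjoint parts. This is only a sketch: the function name `split_indices`, the 70/15/15 proportions and the seed are assumptions made for the example, not the values used in this notebook.

```python
import numpy as np

def split_indices(n_samples, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle sample indices and cut them into train/validation/test parts."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_frac))
    n_val = int(round(n_samples * val_frac))
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]
    return train_idx, val_idx, test_idx

train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))  # 700 150 150
```

The index arrays can then be used to slice the image array and the label array consistently, e.g. `X[train_idx]` and `y[train_idx]`.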

III) CNN model

3.1 - About the optimizer and learning rate

Once our model is built, we need to specify an accuracy function, a loss function and an optimisation algorithm.

The accuracy function is used to evaluate the performance of the model.

The loss function measures how poorly the model performs on data with known labels, in a supervised setting. For multi-class classification with one-hot labels, we use a specific loss function called categorical_crossentropy (the cross-entropy between the true and predicted class distributions).
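To make the loss concrete: for a one-hot target y and predicted probabilities p, the cross-entropy of one sample is -sum_k y_k * log(p_k), and the loss is its mean over the batch. The NumPy sketch below illustrates that formula; it is not the Keras implementation, and the clipping constant `eps` is an assumption added to avoid log(0).

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy between one-hot targets and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

y_true = np.array([[0, 0, 0, 1, 0]])                      # a sunflower (class 3)
confident = np.array([[0.02, 0.02, 0.02, 0.92, 0.02]])    # mostly right
unsure = np.array([[0.2, 0.2, 0.2, 0.2, 0.2]])            # uniform guess

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(loss_confident < loss_unsure)  # True
```

A confident correct prediction gives a loss close to 0, while a uniform guess over 5 classes gives log(5), roughly 1.61.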

Finally, the optimizer is used to minimize the loss function by changing the model parameters (weight values, filter kernel values, etc.).

For this classification problem, we choose the RMSprop optimizer, which is efficient and commonly used (more details on the optimizers on Keras here).

Since it can take quite a while for the optimizer to converge on deep networks, we're gonna use an annealing method for the learning rate (LR).

The LR is basically the size of the steps the optimizer takes. A high LR corresponds to big steps, so convergence is faster. However, in that case the search is not very precise, and the optimizer may overshoot and fail to settle in a good minimum.

Conversely, a low LR means that the optimizer is more likely to settle in a good minimum, but it will take a lot of time.

The idea here is to start from a low, but not too low, value and then decrease the LR during training to efficiently approach a good minimum of the loss function. Using the ReduceLROnPlateau method, we can reduce the LR by a coefficient (here 75%) if the accuracy has not improved after a number of epochs (here 3).


In addition, we use the EarlyStopping method to control the training time: if the accuracy has not improved after 5 epochs, we stop.

Finally, we make use of ModelCheckpoint, which is useful for keeping the best weights found during the training.
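The logic behind these three callbacks can be mimicked in plain Python by replaying a list of accuracy values. This is a simplified sketch of the idea, not the Keras implementation. The function name, the initial LR of 1e-3 and the accuracy history are made up for illustration, and `factor=0.75` assumes that "75%" means the LR is multiplied by 0.75, which is only one possible reading.

```python
def simulate_callbacks(acc_history, lr=1e-3, factor=0.75,
                       lr_patience=3, stop_patience=5):
    """Replay an accuracy history and apply the three rules.

    Returns the LR used at each epoch, the index of the best epoch
    (where a checkpoint would be saved) and the epoch where training stops.
    """
    best_acc, best_epoch = float("-inf"), None
    lr_wait = stop_wait = 0
    lrs = []
    stopped_at = len(acc_history) - 1
    for epoch, acc in enumerate(acc_history):
        lrs.append(lr)
        if acc > best_acc:
            # Improvement: remember it (checkpoint) and reset both counters.
            best_acc, best_epoch = acc, epoch
            lr_wait = stop_wait = 0
            continue
        lr_wait += 1
        stop_wait += 1
        if stop_wait >= stop_patience:
            stopped_at = epoch  # early stopping
            break
        if lr_wait >= lr_patience:
            lr *= factor        # smaller steps from the next epoch on
            lr_wait = 0
    return lrs, best_epoch, stopped_at

# Made-up accuracy values, for illustration only.
history = [0.70, 0.78, 0.80, 0.79, 0.80, 0.79, 0.80, 0.79]
lrs, best_epoch, stopped_at = simulate_callbacks(history)
print(best_epoch, stopped_at)  # 2 7
```

In this replay the accuracy peaks at epoch 2, the LR is reduced once after 3 epochs without improvement, and training stops after 5 such epochs.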

3.2 - Define the model

For now, we're doing feature extraction, i.e. we freeze the convolutional base (MobileNetV2). Then, we add a classifier on top of it and train this top-level classifier.

Now, we need to generate predictions from the block of features. We average over the spatial locations using a GlobalAveragePooling2D layer, which converts the features to a single 1280-element vector per image. Finally, we'll add a regular Dense layer as the final layer, with 5 units corresponding to the 5 species of flowers.

Note that only ~6000 parameters will be trained; the other ~2.2M from the MobileNetV2 model are already trained and stay frozen.
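The shapes and the parameter count can be checked with a small NumPy sketch. It assumes the classifier is a single Dense layer with 5 units applied to the pooled 1280-element vector, which is what the ~6000 figure suggests. The 7x7 spatial size of the feature block and the random values are assumptions for illustration, not outputs of the real network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the frozen base output: (batch, height, width, channels).
# The 7x7 spatial size is an assumption; the 1280 channels come from the text.
features = rng.normal(size=(2, 7, 7, 1280))

# Global average pooling: average over the two spatial axes.
pooled = features.mean(axis=(1, 2))            # shape (2, 1280)

# Dense layer with 5 units followed by softmax.
n_classes = 5
W = rng.normal(scale=0.01, size=(1280, n_classes))
b = np.zeros(n_classes)
logits = pooled @ W + b
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)      # each row sums to 1

trainable_params = W.size + b.size             # 1280 * 5 weights + 5 biases
print(pooled.shape, probs.shape, trainable_params)  # (2, 1280) (2, 5) 6405
```

Under these assumptions the classifier has 1280 x 5 + 5 = 6405 trainable parameters, consistent with the ~6000 mentioned above.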

3.3 - Data augmentation

A useful trick to avoid overfitting is data augmentation. What is that? Well, the idea is to artificially add data to our dataset. But of course not just any data: we alter the existing images with tiny transformations to produce very similar images.

For instance, we rotate an image by a few degrees, shift it off-center, or zoom in or out a little bit. Common augmentation techniques include horizontal/vertical flips, rotations, translations, rescaling, random crops, brightness adjustments and more.

Thanks to these transformations, we can get a bigger dataset (x2, x3 in size) and then train our model in a much more robust way.
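Two of these transformations, a horizontal flip and a small translation, can be sketched with NumPy alone. This is an illustrative toy rather than the augmentation pipeline used in this notebook: the function name `augment`, the `max_shift` value and the random test image are all assumptions.

```python
import numpy as np

def augment(image, rng, max_shift=4):
    """Return a randomly flipped and shifted copy of an (H, W, C) image."""
    out = image
    if rng.random() < 0.5:
        out = out[:, ::-1, :]                      # horizontal flip
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    # np.roll wraps pixels around the border; real pipelines usually pad instead.
    out = np.roll(out, shift=(int(dy), int(dx)), axis=(0, 1))
    return out.copy()

rng = np.random.default_rng(42)
image = rng.random((32, 32, 3))                    # stand-in for a flower image
batch = np.stack([augment(image, rng) for _ in range(4)])
print(batch.shape)  # (4, 32, 32, 3)
```

Each call returns a slightly different version of the same image, so the label stays unchanged while the model sees more variety.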

3.4 - Feature extraction

3.5 - Fine tuning

It is now time to fine-tune our model: we're gonna unfreeze some of the top layers of the base model and train them together with the top-level classifier.
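The unfreezing step boils down to flipping a trainable flag on the last layers only. The sketch below uses stand-in layer objects instead of a real network, so the `Layer` class, the 10-layer base and the cut-off index `fine_tune_at=7` are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Layer:
    """Stand-in for a network layer: just a name and a trainable flag."""
    name: str
    trainable: bool = False

# Hypothetical frozen base with 10 layers (a real base has many more).
base_layers = [Layer(f"block_{i}") for i in range(10)]

def unfreeze_top(layers, fine_tune_at):
    """Keep layers before `fine_tune_at` frozen and unfreeze the rest."""
    for i, layer in enumerate(layers):
        layer.trainable = i >= fine_tune_at

unfreeze_top(base_layers, fine_tune_at=7)
print([layer.name for layer in base_layers if layer.trainable])
# ['block_7', 'block_8', 'block_9']
```

With Keras, the same pattern is commonly applied to the `trainable` attribute of the base model's layers, usually followed by recompiling the model, often with a smaller learning rate so that the pretrained weights are only adjusted slightly.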

Continue training the model

Great! We can see that fine-tuning works and improves the accuracy of our model. We also note that the validation loss tends to increase a bit at the end: to prevent possible overfitting, we could add the EarlyStopping function to the callbacks during the training.

IV) Model evaluation

Indeed, we now have 90% accuracy on the test dataset (compared to 84% before fine tuning)!

Confusion matrix
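A confusion matrix counts, for each true class, how often each class was predicted. As a minimal sketch, it can be computed with NumPy as follows. The labels below are made up for illustration and are not results from this notebook; the helper name `confusion_matrix` is also an assumption.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Count samples per (true class, predicted class) pair."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    # np.add.at accumulates correctly even when index pairs repeat.
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

# Made-up labels for illustration only (not results from this notebook).
y_true = [0, 1, 2, 3, 4, 3, 2, 1]
y_pred = [0, 1, 2, 3, 4, 2, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=5)
accuracy = np.trace(cm) / cm.sum()   # correct predictions lie on the diagonal
print(accuracy)  # 0.75
```

Rows correspond to the true species and columns to the predicted ones, so off-diagonal entries show which species get confused with each other.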

Prediction visualizations

Let's have a look at some predictions!

V) Conclusion

We can note that fine-tuning improves the model predictions. Of course, we can make the model more complex by playing with the hyperparameters and/or adding other layers on top of the MobileNetV2 base model (without its top), such as several Dense layers with some Dropout or BatchNormalization layers in between (to avoid overfitting). Feel free to test different architectures to improve the accuracy of the predictions.

It would also be interesting to compare this fine-tuning method with other models such as VGG16, VGG19, ResNet50, etc.